REGULATING A STOCK EXTERNALITY
UNDER  UNCERTAINTY  WITH  LEARNING 


Charles D. Kolstad1


ABSTRACT 

This paper concerns the problem of efficiently regulating a stock externality (i.e., emissions are
regulated  but  the  stock  of  the  externality  causes  the  damage)  when  uncertainty  exists  and  learning 
is  taking  place  about  the  nature  of  the  externality.  The  tension  is  between  substantial  controls  on 
pollution  when  little  is  known  about  it  versus  waiting  for  more  information  before  instituting  controls. 
Acting  soon  reduces  potential  adverse  effects;  waiting  will  be  advantageous  ex  post  if  the  problem 
turns  out  to  be  less  serious  than  expected.  The  case  considered  here  is  uncertainty  in  how  the 
pollution  stock  affects  utility.   A  three-period  model  is  used  to  examine  the  question. 


1Institute for Environmental Studies and Department of Economics, University of Illinois, 1101
W. Peabody, Room 352, Urbana, Illinois 61801 and Department of Economics, University of
California, Santa Barbara. Research supported by a grant from the Research Board of the University
of Illinois and by NSF Grant SES-91-10325. The paper has benefitted from discussions with Charles
Kahn  and  Henry  van  Egteren  and  from  comments  by  Geir  Asheim,  Robert  Deacon  and  seminar 
participants  at  the  University  of  Illinois,  the  Norwegian  School  of  Economics  and  CORE.  An  earlier 
version  of  this  paper  was  presented  at  the  1992  ASSA  meetings  in  New  Orleans  and  the  1991 
EAERE  meeting  in  Stockholm. 


I.   INTRODUCTION 

Uncertainty  is  a  dominant  characteristic  of  environmental  externalities.  Typically  we 
understand  well  neither  the  effects  of  these  externalities  nor  the  costs  of  controlling  them.  This  is 
one  reason  considerable  sums  are  expended  in  trying  to  better  understand  environmental  problems. 
Examples  abound:  hazardous  wastes  and  groundwater,  global  warming,  acid  rain,  species  extinction, 
pesticide  accumulation,  and  the  list  could  go  on.  An  additional  factor  frequently  comes  into  play 
having  to  do  with  the  cumulative  or  stock  effects  of  the  externality.  For  example,  it  is  not  the 
emissions  of  greenhouse  gases  that  directly  cause  adverse  effects;  rather  it  is  the  stock  of  these  gases 
that may lead to climate change. These two aspects of the problem (stock effects and uncertainty)
lead to a tension between instituting control and delaying control. Some in society will desire control
of greenhouse gases before climate change is well understood. Others in society may urge delaying
control until the problem is clearly delineated. If the problem turns out to be less severe
than expected, then those urging delay will have been proved correct ex post. If, on the other hand,
the problem turns out to be more severe than expected, then delay can be very costly indeed.

This  paper  concerns  this  problem  of  when  and  to  what  extent  to  regulate  the  generation  of 
externalities  when  uncertainty  exists  and  learning  is  taking  place  about  these  externalities.  We  stylize 
the  problem,  considering  two  periods  in  which  decisions  occur,  with  autonomous  learning  between 
the  periods;  i.e.  learning  that  proceeds  with  time  without  regard  to  investments  in  the  learning 
process.  Thus  the  regulator  acts  in  the  first  period,  learns  (though  not  all  uncertainty  is  resolved)  and 
then  acts  again.  The  decision  variable  for  the  regulator  is  emissions  control,  which  is  costly. 
Emissions  accumulate  over  time.  Thus  the  tension  is  between  foregoing  current  period  consumption 
to  reduce  emissions  versus  having  higher  current  period  consumption  but  high  pollution  stocks  in  the 
future. 

Two  primary  results  emerge  from  our  analysis.  If  emissions  control  is  perfectly  reversible  (no 
sunk  capital),  then  the  faster  one  is  learning,  the  lower  current  emissions  should  be.   Thus  learning 
and emissions control are complements, not substitutes. In the case where emission control
investments, once made, become sunk costs, how learning affects current period emissions turns
on the extent to which learning can increase the value of control capital.

There is a considerable literature on the effect of information acquisition on developing
irreplaceable  environmental  assets  (such  as  flooding  the  Grand  Canyon).  A  basic  result,  initially 
demonstrated  by  Arrow  and  Fisher  [1974],  is  that  there  is  an  information  value  in  deferring 
irreversible  actions.  Absent  from  this  literature  is  a  consideration  of  the  effect  of  the  rate  of 
information  acquisition  and  the  tension  between  irreversibilities  in  environmental  effects  and  the  sunk 
cost  nature  of  investments  to  protect  the  environment.2  This  paper  addresses  both  of  these  issues. 

The next section of the paper reviews some important contributions to this literature,
including quasi-option value and irreversibilities in investment as well as the literature on
decision-making when learning is taking place. The subsequent sections present our model of optimal
regulation, focusing on stock effects from the externality. We examine the case of uncertainty in the
disutility of pollution.
II.  BACKGROUND 
A.   Irreversibilities  and  Stock  Externalities 

Although  the  content  of  this  paper  is  new,  the  results  build  on  a  considerable  literature.  A 
major  literature  has  developed  in  the  area  of  investment  under  uncertainty  in  the  presence  of 
externalities. Arrow and Fisher [1974] initiated much of the work in this area by focusing on a
two-period model with uncertainty about the benefits of an environmental asset that is to be exploited
(e.g., a canyon flooded to make electricity). With some uncertainty resolved between the two periods
and  the  impossibility  of  undoing  development  of  the  environmental  asset,  it  turns  out  to  be  optimal 


2Freeman  (1984)  and  Miller  and  Lad  (1984)  consider  the  case  where  development  yields  useful 
information.  This  is  a  different  issue.  Perhaps  the  paper  that  comes  closest  to  the  analysis  presented 
here  is  Olson  (1990). 
to bias development in favor of preservation of the environmental asset. Henry [1974] published
similar results at the same time. In essence, taking an irreversible action has a cost in terms of
reducing the value of information. Arrow and Fisher [1974] introduced the notion of quasi-option
value, the value of the information gained by waiting before exploiting the environmental asset. Since
then, there has been a considerable literature on irreversibilities and on quasi-option value (e.g., see
Fisher and Hanemann, 1987, 1990; Freeman, 1984; Olson, 1990; Conrad, 1980; and Miller and Lad,
1984). Of course, there is a large literature in finance on option value. In particular, a number of
recent papers concern the optimal timing of capital investments (e.g., oil field development) when
learning is taking place (e.g., oil field exploration); see Paddock et al. (1988).

Another related literature, primarily from the early 1970's, concerns optimal growth in the
presence of environmental externalities, particularly stock externalities. This was a natural extension
of the optimal growth models that were popular in the 1960's and early 1970's. An important and
characteristic paper in this genre is that of Keeler et al. (1971). In that paper a simple optimal growth
model  is  posited  where  utility  is  a  function  of  consumption  and  a  stock  of  pollution.  Optimal  paths 
for  accumulation  of  capital  and  pollution  are  developed  for  several  different  types  of  pollution 
control.  Other  papers  of  this  type  include  Plourde  (1972),  d'Arge  and  Kogiku  (1973),  Smith  (1972), 
Plourde  and  Yeung  (1989)  and  Forster  (1973).  Cropper  (1976)  also  considers  such  a  model  of 
optimal  growth  but  focuses  on  catastrophic  environmental  effects—the  ultimate  in  irreversibilities. 
B.   Learning 

The typical approach to including learning in models of irreversibility is to posit a two or three
period model where uncertainty changes from one period to the next. Miller and Lad [1984] use a
two period model with an ex ante probability distribution on period $i$ benefits ($b_i$) of $f(b_1, b_2)$. After
observing period one benefits, the ex post marginal distribution is obtained: $f(b_2 \mid b_1)$. While this
is clearly learning, we need a way to parameterize the rate of learning so that the effects of the rate
of learning can be deduced. Jones and Ostroy [1984], Olson [1990] and Marschak and Miyasawa
[1968] provide such a framework through the concept of an ordering on information structures.
Starting with a set of states of nature and an informative message, an information structure consists
of a prior on the probabilities of receiving specific messages, along with a conditional probability on
states of nature, given a specific message. Of two information structures with the same prior on
states of nature, the one that has the greater variability in terms of possible posteriors is viewed as
being "more informative." This is equivalent to the more informative structure yielding a higher
attainable expected utility when the consumption bundle depends on the state of nature (Jones and
Ostroy, 1984). Thus if two learning processes yield two comparable information structures, then the
structure that is more informative corresponds to greater learning.

To quantify this concept of learning further, suppose there is a set of possible states of nature,
indexed by $s = 1, \ldots, S$. Furthermore, suppose there is a finite set, $Y$, of possible "messages" containing
information on the state of nature. Suppose the prior on receiving particular messages is $q$
(dimension equal to the size of $Y$) and the conditional probability on states of nature (after the
message $y \in Y$ has been received) is $\pi(y)$. We use the term "prior" to refer to a probability distribution
before the message is received and "posterior" to refer to distributions assuming a message has been
received. Let $\Pi$ be a matrix with columns consisting of $\pi(y)$, with a different column for each $y$. Thus
$\Pi$ has $S$ rows and the same number of columns as members of $Y$. $(\Pi, q)$ is an information structure.
A first goal is to develop an economically relevant ordering on information structures. A standard
definition of the comparative value of information is provided by Jones and Ostroy [1984] (see also
Laffont, 1989):

Defn. Given a finite set $A$ of actions (chosen after the message $y$ becomes known), a set of states of
nature $S$ and utility $u$ defined on $A \times S$, $(\Pi, q)$ is more valuable than $(\Pi', q')$ if for all bounded utility
functions,

$$E_y\left[\max_{a \in A} E_{s \mid y}\, u(a,s)\right] \;\ge\; E_{y'}\left[\max_{a \in A} E_{s \mid y'}\, u(a,s)\right] \qquad (1)$$


Thus with actions chosen after the message is received, an information structure that always
yields a higher expected utility is a more valuable structure. Blackwell's theorem connects this notion
of value with variability in beliefs:

Theorem (Blackwell, 1953): Given two information structures, $(\Pi, q)$ and $(\Pi', q')$, $(\Pi, q)$ is more
valuable than $(\Pi', q')$ if there exists a non-negative matrix $M$, with columns summing to 1 (i.e., a
Markov matrix) such that

$$\Pi' = \Pi M \quad \text{and} \quad q = M q' \qquad (2)$$

It follows that for (2) to hold, priors on states of nature must be the same:

$$\pi = \Pi q = \Pi' q' \qquad (3)$$

Using the terminology preferred by Jones and Ostroy [1984], $(\Pi, q)$ is the structure with the higher
variability in beliefs (i.e., variability in posteriors on states-of-nature). Note that the extremes of
variability are in the two structures: i) $\pi(y) = \pi$ for all $y$, where $\pi$ is the prior on states of nature and
is also the posterior, no matter what message was received; ii) $\pi(y) = e$, where $e$ is a vector of 0's and
1's, which implies that the message resolved all uncertainty. The first of these has the minimum amount
of variability, the second the maximum. Jones and Ostroy (1984) establish that condition (2) defines
a partial ordering on information structures.
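As an illustration of definition (1) and condition (2), the following Python sketch computes the value of an information structure for a given utility matrix and checks whether a candidate matrix $M$ satisfies (2). The function names, the two-state prior and the payoff matrix are illustrative assumptions, not part of the original analysis; the comparison shown is the extreme case of perfect information versus no information.

```python
import numpy as np

def information_value(Pi, q, u):
    """Expression in (1): E_y[ max_a E_{s|y} u(a, s) ].

    Pi : (S, Y) array, column y is the posterior pi(y) over states.
    q  : (Y,) array, prior probability of each message.
    u  : (A, S) array, u[a, s] is the utility of action a in state s.
    """
    expected_u = u @ Pi                      # (A, Y): E_{s|y} u(a, s)
    return float(q @ expected_u.max(axis=0))

def satisfies_condition_2(Pi, q, Pi_prime, q_prime, M, tol=1e-9):
    """Check Pi' = Pi M and q = M q' with M non-negative, columns summing to 1."""
    markov = np.all(M >= -tol) and np.allclose(M.sum(axis=0), 1.0, atol=tol)
    return bool(markov
                and np.allclose(Pi_prime, Pi @ M, atol=tol)
                and np.allclose(q, M @ q_prime, atol=tol))

# Hypothetical two-state example.
prior = np.array([0.3, 0.7])
Pi_full, q_full = np.eye(2), prior.copy()            # posteriors at the vertices
Pi_none = np.column_stack([prior, prior])            # posterior equals the prior
q_none = np.array([0.5, 0.5])
M = np.column_stack([prior, prior])                  # every column is the prior

u = np.array([[1.0, 0.0],                            # action 0 pays off in state 0
              [0.0, 1.0]])                           # action 1 pays off in state 1

v_full = information_value(Pi_full, q_full, u)
v_none = information_value(Pi_none, q_none, u)
ordered = satisfies_condition_2(Pi_full, q_full, Pi_none, q_none, M)
print(ordered, v_full, v_none)
```

In this example the perfectly informative structure dominates, as the theorem requires whenever (2) holds; other payoff matrices can be substituted for `u` to probe the "for all bounded utility functions" clause.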

A restriction on the set of comparable (using Blackwell's theorem) information structures that
has proved useful in examining learning (Jones and Ostroy, 1984; Olson, 1990) is the set of
star-shaped beliefs:

Defn: The information structure $(\Pi, q)$ is a star-shaped spreading of $(\Pi', q')$ if i) $\Pi q = \Pi' q' = \pi$;
ii) $q = q'$; and iii) there exists $0 \le \lambda_y \le 1$ such that

$$\pi'(y) = \lambda_y\, \pi(y) + (1 - \lambda_y)\, \pi \qquad (4)$$

Applying Blackwell's theorem implies that in the above definition, $(\Pi, q)$ is more valuable than
$(\Pi', q')$.

As an example, suppose you can receive one of three messages indicating whether the state
of nature is 1, 2 or 3. We thus assume that the number of possible messages equals the number of
possible states-of-nature, which need not be the case. A message that conveyed the maximum
amount of information would resolve all uncertainty on the state of nature. If the message is too
noisy to contain any information, then the posterior on states of nature is the same as the prior. This
is illustrated in Figure 1 where the simplex of probabilities on states of nature is shown. The prior
is $\pi$. The set of posteriors associated with a star-shaped spreading of beliefs, spread all the way out
to the vertices, is shown by the three lines radiating out from $\pi$. Perfect learning would move you
to one of the three vertices following receipt of the message. Less perfect learning would move you
to one of the three points marked with circles. Even less perfect learning would move you to one
of the three points marked with x's after receiving the message.

The advantage of representing learning by this star-shaped spreading of beliefs is that the
process can be parameterized by the $\lambda_y$ in equation (4). The disadvantage is that we have eliminated
perfectly legitimate and orderable learning processes (emanating from $\pi$ in Figure 1).

III.   UNCERTAIN  POLLUTION  DAMAGE 

The  purpose  of  this  section  is  to  develop  a  stochastic  model  of  the  joint  generation  of 
pollution  and  consumption  goods.  Pollution  accumulates.  The  basic  issue  of  concern  is  how  much 
pollution  to  emit  today  and  how  those  optimal  emissions  are  affected  by  the  rate  of  learning.  Today's 
emissions can be reduced but at a cost in terms of today's consumption. We focus on the
comparative statics of optimal current-period emissions with respect to the rate of learning.

Decisions  on  emissions  are  made  at  discrete  points  in  time.  Utility  is  affected  by  consumption, 
which  is  influenced  by  emissions,  and  the  stock  of  pollution.  The  stock  of  pollution  evolves  through 
emissions  augmenting  the  stock.  The  way  in  which  pollution  enters  the  utility  function  is  uncertain, 
and  the  decision-maker  has  a  prior  on  this  uncertainty.  Furthermore,  this  prior  is  updated  over  time 
through  learning.  Thus  the  decision  maker  is  always  faced  with  the  decision  to  emit  now  and  enjoy 
the  resulting  consumption,  but  perhaps  create  future  disutility,  versus  reducing  emissions  now, 
foregoing  consumption  but  enjoying  lower  stocks  of  pollution  later.  The  thinking  is  that  if,  later  on, 
pollution  turns  out  to  be  less  serious,  emissions  can  always  be  increased. 

We  will  consider  two  cases.  One  involves  emissions  being  proportional  to  consumption. 
Emissions  can  be  reduced  by  reducing  consumption,  diverting  output  to  control  or  emitting  less  by 
producing  less.  The  other  case  requires  investment  in  emission  control  capital.  Production  must  be 
diverted  when  the  initial  investment  in  control  capital  is  made;  however  the  control  capital  can  then 
be  costlessly  used  in  subsequent  periods  to  control  pollution.  If  later  on  pollution  turns  out  to  be 
less  of  a  problem,  there  may  be  excess  pollution  control  capital.  First  we  set  up  the  general  model. 
A.   The  General  Model 

Let the representative consumer's utility per unit time be a function of the stock of the
externality, $P$, and consumption, $C$: $U(P,C)$. Assume $U$ is concave, $U_C$ is positive and
$U_P$, $U_{CP}$, $U_{PP}$ and $U_{CC}$ are negative.3 Thus $P$ is a bad and $C$ is a
good. Let emissions be $E$. The relationship between emissions and consumption will be detailed later.
The stock of the externality evolves according to

$$\Delta P = rE \qquad (5)$$


3Negativity of $U_{CP}$ is equivalent to having higher marginal disutility of pollution at higher
consumption  levels.   Thus  the  bigger  a  consumer  you  are,  the  more  "disrupting"  is  pollution. 
Clearly $r > 0$. Note that there is no natural decay of $P$, although that would be easy to include. Some
variables in our model will be stochastic, taking on discrete values depending on a realization of the
state-of-nature, which is unknown. Let $\pi(t)$ be a vector of probabilities, $(\pi_1, \ldots, \pi_S)$, with $\pi_i(t)$ being
the probability at $t$ that the state-of-nature is $i$. Obviously $\sum_i \pi_i = 1$. In our case, we are unsure
exactly how the pollution stock affects utility. We represent this uncertainty by writing utility as
$U(\delta_i P, C)$ where $\delta_i$ depends on the realized state-of-nature. Assume $i > j \Rightarrow \delta_i > \delta_j > 0$.

We now turn to characterize the learning process. We assume that the set of possible
messages, $Y$, is of the same size as the set of possible states-of-nature. Each message, $y_i \in Y$, is a
noisy message that the state of nature is $i$. The less noisy $y_i$ is, the more certainty there will be, ex
post (i.e., after the message), that the state of nature is $i$. Furthermore, we assume the probability
vector, $q$, associated with the messages is the same as the prior on the states of nature. Thus if one's prior
on state $i$ is $\pi_i$ (i.e., the probability that the state of nature is $i$), then the probability of receiving $y_i$
is also $\pi_i$: $q = \pi$. For example, consider the case of global warming where there is uncertainty over
just how serious the problem is. Suppose there are two states of nature, B and L, for global warming
being a big problem and global warming being a little problem. Suppose the prior on states is $(p, 1-p)$.
Our assumption is that learning (R&D) may result in one of two messages, one suggesting B, the
other suggesting L. The probability that learning will suggest B is $p$ and the probability that
learning suggests L is $1-p$.

Furthermore, we characterize learning as a star-shaped spreading of beliefs with learning
parameter $\lambda$. Let $(\Pi, q)$ be the information structure at $t$. If $\pi(t) = \Pi q = q$ is a prior on states of
nature at time $t$, then learning can move in $S$ possible directions (see Figure 1). The $i$th column of
$\Pi$ is

$$\pi^i = (1-\lambda)\,\pi(t) + \lambda e^i \;\Rightarrow\; \Delta\pi = [e^i - \pi(t)]\,\lambda \qquad (6)$$

where $e^i$ is a vector of zeros except with a one in the $i$th position. Another way of viewing the vector
$\pi^i$ in (6) is as the posterior on states of nature, given that learning proceeded in direction $i$. With
reference to Fig. 1, this might correspond to point D. With $\lambda = 0$, no learning occurs. With $\lambda = 1$,
movement to the vertices of the simplex (perfect information) occurs.4 Thus $\lambda \in [0,1]$.

It should be mentioned that this description of learning, while intuitively attractive, is quite
restrictive. Not only have we restricted the set of messages but also the way in which learning modifies
probabilities on states of nature. Despite this, the characterization seems fairly realistic in that a prior
on states of nature "moves" at some speed ($\lambda$) towards perfect knowledge, the vertices of the simplex.
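The learning process in (6) can be sketched numerically. The Python fragment below builds the matrix $\Pi$ whose $i$th column is $(1-\lambda)\pi + \lambda e^i$, confirms that the prior is preserved ($\Pi q = \pi$ when $q = \pi$), and checks that a slower learning rate $\lambda' \le \lambda$ is related to the faster one through condition (2). The particular matrix $M = (\lambda'/\lambda) I + (1 - \lambda'/\lambda)\,\pi \mathbf{1}^T$ is a construction made for this illustration rather than something stated in the paper, and the three-state prior is hypothetical.

```python
import numpy as np

def learning_structure(prior, lam):
    """Return (Pi, q) with column i of Pi equal to (1 - lam) * prior + lam * e^i and q = prior."""
    prior = np.asarray(prior, dtype=float)
    S = prior.size
    Pi = (1.0 - lam) * np.tile(prior[:, None], (1, S)) + lam * np.eye(S)
    return Pi, prior.copy()

def garbling_matrix(prior, lam, lam_slow):
    """Illustrative Markov matrix M with Pi(lam_slow) = Pi(lam) @ M (needs 0 <= lam_slow <= lam, lam > 0)."""
    prior = np.asarray(prior, dtype=float)
    S = prior.size
    w = lam_slow / lam
    return w * np.eye(S) + (1.0 - w) * np.tile(prior[:, None], (1, S))

prior = np.array([0.2, 0.5, 0.3])            # hypothetical three-state prior
Pi_fast, q = learning_structure(prior, 0.8)
Pi_slow, _ = learning_structure(prior, 0.4)
M = garbling_matrix(prior, 0.8, 0.4)

prior_preserved = np.allclose(Pi_fast @ q, prior)        # Pi q = pi
slow_is_garbled = np.allclose(Pi_slow, Pi_fast @ M)      # Pi' = Pi M
messages_match = np.allclose(M @ q, q)                   # q = M q'
print(prior_preserved, slow_is_garbled, messages_match)
```

Under these assumptions a larger $\lambda$ gives a structure that is more valuable in the sense of (1), which is the sense in which $\lambda$ indexes the rate of learning.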

We now turn to specifying the dynamic structure of the problem. Assume there are three
time periods, 1, 2, and 3. Decisions are made in time periods 1 and 2. In time period 1, the choice
must be made regarding emissions, $E$, given a prior probability distribution on states-of-nature. After
that choice is made, learning occurs and a posterior probability distribution on states-of-nature results.
The stock also evolves. In time period 2 another choice is made regarding emissions. Then the stock
of pollution evolves again. Utility in the third time period is a function of the stock of pollution only,
$V(P)$. This function, $V(P)$, can be interpreted in two ways. One is that the world has three periods
and nothing occurs after the third period. The other is that $V(P)$ represents the net present value
of maximum attainable utility over the indefinite future, starting with pollution stock $P$. Either way,
our model is the same. In the second case, $V(P)$ depends on $U(P,C)$ and in principle can be
computed from $U(P,C)$. Similar to $U$, assume $V_P < 0$, $V_{PP} < 0$. Thus $V$ is concave.
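The timing just described can be traced with a short simulation. The Python sketch below runs one pass through the three periods: emissions in period 1, a message drawn with probabilities $q = \pi$, the belief update in (6), emissions in period 2, and the stock evolving by (5) after each decision. The function names and the period-2 decision rule are hypothetical placeholders; the optimal rule is what the model in the following sections characterizes.

```python
import random

def simulate_once(prior, lam, r, P1, E1, choose_E2, rng):
    """One pass through periods 1-3; returns (P3, posterior, E2)."""
    S = len(prior)
    # Period 1: emit E1 under the prior; the stock evolves by (5).
    P2 = P1 + r * E1
    # Learning: message i arrives with probability q_i = prior[i] ...
    i = rng.choices(range(S), weights=prior, k=1)[0]
    # ... and beliefs move a fraction lam of the way toward vertex e^i, as in (6).
    posterior = [(1 - lam) * p + (lam if s == i else 0.0)
                 for s, p in enumerate(prior)]
    # Period 2: emit E2 under the posterior; the stock evolves again.
    E2 = choose_E2(posterior, P2)
    P3 = P2 + r * E2
    return P3, posterior, E2

def rule(posterior, P2):
    """Hypothetical period-2 rule: emit less when the high-damage state looks more likely."""
    return 1.0 - posterior[-1]

P3, post, E2 = simulate_once([0.5, 0.5], lam=1.0, r=1.0, P1=0.0, E1=1.0,
                             choose_E2=rule, rng=random.Random(0))
print(P3, post, E2)
```

With $\lambda = 1$ the posterior lands on a vertex of the simplex, so the period-2 choice is made under certainty; with $\lambda = 0$ the posterior equals the prior and the two decisions are made under the same beliefs.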

Denote by $H(\pi,P)$ the maximum utility attainable in period 2, given $(\pi,P)$ at the beginning


4The information structure associated with the transition is $(\Pi, q)$ where $\pi = q = \Pi q$, $\Pi_{ii} = \pi_i + (1-\pi_i)\lambda$
and $\Pi_{ij} = \pi_i(1-\lambda)$ for $i \ne j$.
of period 2. Denote by $J(\pi,P)$ the maximum utility attainable in period 1, given $(\pi,P)$ at the
beginning of period 1. Let the period 1-2 discount rate be $\beta$. We now consider our two cases.
B.   No  Stock  Effect  in  Pollution  Control 

Emissions  will  be  assumed  proportional  to  consumption.  Emissions  can  be  reduced,  but  only 
by reducing consumption. Without loss of generality, we assume $C = E$. We can now write $H(P,\pi)$,
the maximum utility attainable in period 2, as

$$H(P,\pi) = \max_{E_2}\left\{\sum_i \pi_i\, W(\delta_i, P, E_2)\right\} \qquad (7)$$

where

$$W(\delta_i, P, E_2) = U(\delta_i P, E_2) + V[\delta_i(P + rE_2)]$$


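Similarly, we can write the maximum utility attainable in period 1, in terms of $H$:

$$J(\pi,P,\lambda) = \max_{E_1}\left\{J(\pi,P,\lambda,E_1) \equiv \sum_i \pi_i\, U(\delta_i P, E_1) + \beta \sum_j q_j\, H[P + rE_1, \pi^j]\right\} \qquad (8)$$

where $\pi^j = \pi + \lambda(e^j - \pi)$.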

Although by assumption $q = \pi$, for clarity we have retained both variables to distinguish between the
events "being in state of nature $i$" and "learning progressing in the direction of $e^j$." Of course
$\sum_i \pi_i = \sum_j q_j = 1$. Clearly $E_2^*(P,\pi)$ satisfies

$$\sum_i \pi_i\, W_E(\delta_i, P, E_2^*) = 0. \qquad (9)$$
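To make the period-2 problem in (7) and its first-order condition (9) concrete, the following Python sketch solves it for a quadratic specification. The functional forms $U(x,C) = aC - \tfrac{b}{2}C^2 - cx - \kappa xC$ and $V(x) = -\nu x - \tfrac{\omega}{2}x^2$, along with every parameter value, are hypothetical choices made only for this illustration; with them, (9) is linear in $E_2$ and has a closed-form solution.

```python
# Hypothetical quadratic specification (all parameter values are illustrative).
R = 1.0                    # r in (5): stock added per unit of emissions
A, B = 4.0, 1.0            # consumption terms of U: A*C - 0.5*B*C**2
C_P, KAPPA = 0.5, 0.1      # linear damage and consumption-damage interaction in U
NU, OMEGA = 0.5, 0.2       # V(x) = -NU*x - 0.5*OMEGA*x**2

def W(delta, P, E2):
    """W = U(delta*P, E2) + V(delta*(P + r*E2)), using C = E2."""
    x_now, x_next = delta * P, delta * (P + R * E2)
    U = A * E2 - 0.5 * B * E2**2 - C_P * x_now - KAPPA * x_now * E2
    V = -NU * x_next - 0.5 * OMEGA * x_next**2
    return U + V

def W_E(delta, P, E2):
    """Partial derivative of W with respect to E2."""
    return (A - B * E2 - KAPPA * delta * P
            - NU * delta * R - OMEGA * delta**2 * R * (P + R * E2))

def optimal_E2(pi, deltas, P):
    """Solve (9), sum_i pi_i W_E = 0, which is linear in E2 for this quadratic W."""
    m1 = sum(p * d for p, d in zip(pi, deltas))        # E[delta]
    m2 = sum(p * d * d for p, d in zip(pi, deltas))    # E[delta^2]
    return ((A - m1 * (KAPPA * P + NU * R) - OMEGA * R * m2 * P)
            / (B + OMEGA * R**2 * m2))

def H(pi, deltas, P):
    """Maximum expected period-2 utility, as in (7)."""
    E2 = optimal_E2(pi, deltas, P)
    return sum(p * W(d, P, E2) for p, d in zip(pi, deltas))

pi, deltas, P = (0.5, 0.5), (1.0, 2.0), 1.0
E2_star = optimal_E2(pi, deltas, P)
foc = sum(p * W_E(d, P, E2_star) for p, d in zip(pi, deltas))
print(E2_star, foc, H(pi, deltas, P))
```

Because the expected objective is strictly concave in $E_2$ under these parameter values, the stationary point found from (9) is the maximizer in (7).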

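The first order condition for an optimal $E_1^*$ is

$$J_E(\pi,P,\lambda,E_1^*) = 0 \qquad (10)$$

Totally differentiating this with respect to $\lambda$ and $E_1$ yields

$$J_{E\lambda}\, d\lambda + J_{EE}\, dE_1 = 0 \;\Rightarrow\; \frac{dE_1^*}{d\lambda} = -\frac{J_{E\lambda}}{J_{EE}} \qquad (11)$$

From (8) it is clear that

$$J_{EE}(\pi,P,\lambda,E_1^*) = \sum_i \pi_i\, U_{CC} + \beta r^2 \sum_j q_j\, H_{PP}[P + rE_1^*, \pi^j] \qquad (12)$$

Furthermore,

$$H_{PP}(P,\pi) = \sum_i \pi_i \left[W_{PP}(\delta_i,P,E_2^*) + W_{PE}(\delta_i,P,E_2^*)\,\frac{dE_2^*}{dP}\right] \qquad (13)$$

which can be shown to be negative by totally differentiating (9) or through concavity arguments.
Thus $J_{EE} < 0$. Differentiating eqn (8) we obtain

$$J_{E\lambda}(\pi,P,\lambda,E_1^*) = \beta r \sum_i q_i \sum_{l<S} (e^i_l - \pi_l)\, H_{P\pi_l}[P + rE_1^*, \pi^i]. \qquad (14)$$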
But from (7),

$$H_P[P,\pi] = \sum_i \pi_i\, W_P[\delta_i, P, E_2^*(P,\pi)] \qquad (15)$$

which implies (for $l < S$)

$$H_{P\pi_l} = W_P[\delta_l,P,E_2^*] + \frac{dE_2^*}{d\pi_l} \sum_i \pi_i\, W_{PE}[\delta_i,P,E_2^*]. \qquad (16)$$


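But from (9),

$$\frac{dE_2^*}{d\pi_l} = -\frac{\pi_l\, W_E[\delta_l,P,E_2^*]}{\sum_i \pi_i\, W_{EE}[\delta_i,P,E_2^*]} \qquad (17)$$

Combining (16) and (17) yields

$$H_{P\pi_l}[P,\pi] = W_P[\delta_l,P,E_2^*] - \pi_l\, W_E[\delta_l,P,E_2^*]\, \frac{\sum_i \pi_i\, W_{PE}[\delta_i,P,E_2^*]}{\sum_i \pi_i\, W_{EE}[\delta_i,P,E_2^*]}. \qquad (18)$$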


This can be combined with (14) to yield $J_{E\lambda}$. Unfortunately, it is not easy to sign this. One
complication is that third order derivatives of $W$ may be needed. We can derive results to a
second-order approximation; i.e., assuming derivatives of the utility functions of third order and higher are
zero.
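The comparative-statics relation (11) is an application of the implicit function theorem to the first-order condition (10), and it can be checked numerically on a toy problem. The Python sketch below uses a hypothetical concave objective $J(E,\lambda) = -(E - (1 - 0.5\lambda))^2$, chosen only because its maximizer $E^* = 1 - 0.5\lambda$ is known in closed form; it is not the objective in (8). The slope obtained by re-optimizing at nearby values of $\lambda$ is compared with $-J_{E\lambda}/J_{EE}$ evaluated by finite differences.

```python
def J(E, lam):
    """Hypothetical concave objective; its maximizer is E* = 1 - 0.5 * lam."""
    return -(E - (1.0 - 0.5 * lam)) ** 2

def argmax_E(lam, lo=-5.0, hi=5.0, iters=200):
    """Ternary search for the maximizer of the concave function J(., lam)."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if J(m1, lam) < J(m2, lam):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

def slope_from_11(E, lam, h=1e-4):
    """-J_{E lam} / J_{EE}, with both second derivatives by central differences."""
    J_EE = (J(E + h, lam) - 2 * J(E, lam) + J(E - h, lam)) / h**2
    J_El = (J(E + h, lam + h) - J(E + h, lam - h)
            - J(E - h, lam + h) + J(E - h, lam - h)) / (4 * h**2)
    return -J_El / J_EE

lam = 0.3
E_star = argmax_E(lam)
direct = (argmax_E(lam + 1e-3) - argmax_E(lam - 1e-3)) / 2e-3   # re-optimize and difference
formula = slope_from_11(E_star, lam)                             # right-hand side of (11)
print(E_star, direct, formula)
```

Since $J_{EE} < 0$ at an interior maximum, the sign of $dE^*/d\lambda$ is the sign of $J_{E\lambda}$, which is why the analysis concentrates on signing that cross-derivative.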
Prop. 1 In a two state model with uncertainty in damage from the pollution stock, an increase in the
rate of learning decreases current period emissions, to a second order approximation of utility.
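Proof: It is clear from eqn (11) that the sign of $dE_1^*/d\lambda$ is the same as the sign of $J_{E\lambda}$. We thus need
to show $J_{E\lambda}$ is negative. In our case $J_{E\lambda}$ can be written with $(\pi, 1-\pi)$ as the probabilities for the two
states:

$$J_{E\lambda} = \beta r\, \pi(1-\pi)\left\{H_{P\pi}[P + rE_1^*, \pi^1] - H_{P\pi}[P + rE_1^*, \pi^2]\right\} \qquad (19)$$

whose sign is the same as the term in braces. To sign this, define $a = (-1, 1)$ and $f(\alpha) = H_{P\pi}[\bar P, \pi^1 + \alpha a]$
with $\bar P = P + rE_1^*$ and $\alpha \in R_+$. Since $\pi^1 = [\pi + \lambda(1-\pi),\ (1-\lambda)(1-\pi)]$ and $\pi^2 = [(1-\lambda)\pi,\ (1-\pi) + \lambda\pi]$,
we have $\pi^2 = \pi^1 + \lambda a$, so clearly $f(0) - f(\lambda)$ is the term in braces in (19). If $f'(\alpha)$ is positive for all $\alpha$,
$0 \le \alpha \le \lambda \le 1$, then $f(0) - f(\lambda) < 0$ and the proposition
will be proved. Differentiating $f(\alpha)$ we obtain (remembering that $\pi_1 + \pi_2 = 1$)5: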


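$$f'(\alpha) = \left[\frac{\partial f}{\partial \pi} + \frac{\partial f}{\partial E_2^*}\,\frac{\partial E_2^*}{\partial \pi}\right]\frac{d\pi}{d\alpha}$$

$$= \left\{-\left[W_{PE} - (\pi^1_1 - \alpha)\,W_{PE}\right]\frac{(\pi^1_1 - \alpha)\,W_E[\delta_1,\bar P,E_2^*]}{W_{EE}} - \frac{W_{PE}}{W_{EE}}\,W_E[\delta_1,\bar P,E_2^*]\right\}\{-1\}$$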


5Note that when $\alpha$ changes, $\pi$ changes; when $\pi$ changes, $E_2^*$ changes; when $E_2^*$ changes slightly,
$H$ and $W$ are unaffected (by the envelope theorem) but $H_{P\pi}$, $W_P$ and $W_E$ (and thus $f$) do change.
$$= \frac{W_{PE}}{W_{EE}}\left\{[(1-\lambda)(1-\pi) + \alpha]\,[\pi^1_1 - \alpha] + 1\right\} W_E[\delta_1,\bar P,E_2^*] \qquad (20)$$


In the above, our assumption on a second-order approximation of utility means third-order derivatives
of $W$ are zero. By assumptions on $U$ and $V$ and the definition of $W$, we know $W_{E\delta} < 0$. From (9)
and the fact that $W_{E\delta} < 0$, we have $W_E[\delta_1,\bar P,E_2^*] > 0$. Thus $f'(\alpha) > 0$ and the proposition is proved. ∎

The  interpretation  of  this  proposition  is  straightforward.  If  you  are  uncertain  about  how 
pollution  affects  your  utility  but  will  know  better  tomorrow,  then  it  is  better  to  under-emit  today, 
perhaps  correcting  your  "mistake"  tomorrow. 

It is appropriate to compare this result to the results of Arrow and Fisher (1974). Their
results indicate that when $\lambda > 0$, there should be more of an effort to avoid irreversibilities than when
$\lambda = 0$: $dE/d\lambda < 0$. This is, in fact, our result.

The policy implications of this result are quite significant. Emissions can either be directly
controlled by reducing their level or indirectly controlled by increasing $\lambda$ through investment in R&D.
This result does not support the view that if learning is progressing rapidly then controls should be
deferred until more is known. Learning and pollution control are complements, not substitutes.
Another way to look at this is that the less you emit, the greater the value of the information you receive.
Thus receiving greater amounts of information enhances the value of under-emitting.
C.    Investment  in  Emission  Control  Capital 

In  the  last  section  we  saw  that  learning  decreases  current  emissions  because  of  the  stock 
nature  of  the  externality.  We  now  turn  to  the  case  where  there  is  a  type  of  irreversibility  in  pollution 
control.  Once  you  invest  in  pollution  control  capital,  that  capital  is  useful  for  all  time.  However,  it 
is useful only for pollution control and cannot later be "un-invested." It may be a bit extreme to
assume that control capital does not depreciate. However, this is a useful approximation and is not
unreasonable for some investments, such as R&D in pollution control.

As in the model developed in the last section of the paper, net emissions cause the pollution
stock to evolve, according to eqn (5). The main difference is that instead of emissions being the
decision variable, investment in emission control capital ($I$) is the decision variable. This investment
is costly but infinitely lived. Net emissions are defined as $E_0 - K$, where $K$ is the stock of emission
control capital and $E_0$ is gross emissions (before control) and is time invariant. Thus the pollution
stock evolves as

$$\Delta P = \begin{cases} r(E_0 - K) & \text{for } K \le E_0 \\ 0 & \text{otherwise} \end{cases} \qquad (21)$$
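The stock dynamics in (21) can be traced with a few lines of Python. In the sketch below, the function names and the numerical values are illustrative assumptions, and each period's investment is added to the capital stock before that period's emissions occur, which is the timing implied by the term $r(E_0 - K - I_2)$ in the period-2 objective.

```python
def delta_P(K, E0, r):
    """Equation (21): net emissions E0 - K add to the stock, scaled by r;
    capital beyond gross emissions (K > E0) leaves the stock unchanged."""
    return r * (E0 - K) if K <= E0 else 0.0

def stock_path(P1, K1, investments, E0, r):
    """Pollution stock after each period, given a sequence of investments I_t."""
    P, K, path = P1, K1, [P1]
    for I in investments:
        K += I                       # control capital is never un-invested
        P += delta_P(K, E0, r)
        path.append(P)
    return path

path = stock_path(P1=0.0, K1=0.0, investments=[0.25, 0.5], E0=1.0, r=2.0)
print(path)
```

Because capital accumulates and does not depreciate, an investment made in period 1 lowers the stock increment in every later period, which is the sunk-cost feature that distinguishes this case from the previous one.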

Utility is, as before, a function of consumption and the stock of pollution, $U(\delta_i P, C)$, where $\delta$
is state-dependent. If $Y_0$ is gross output, then $C = Y_0 - I$. Since $Y_0$ only enters the utility function and
is time invariant, without loss of generality we can define $C \equiv -I$.

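Thus eqn (7) can be rewritten as

$$H(P,K,\pi) = \max_{I_2}\left\{\sum_i \pi_i\, W(\delta_i, P, K, I_2)\right\} \qquad (22)$$

where

$$W(\delta_i, P, K, I_2) = U(\delta_i P, -I_2) + V(\delta_i[P + r(E_0 - K - I_2)])$$

Note that $W$ is concave in $(P,K,I_2)$. Thus $H$ is concave in $(P,K)$.6 Eqn (8) can be rewritten as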


6If $g(z,a)$ is concave in $(z,a)$ then $g^*(a) = \max_z g(z,a)$ is concave in $a$ (Mangasarian and Rosen,
1964).
$$J(\pi,P,K,\lambda) = \max_{I_1}\left\{J(\pi,P,K,\lambda,I_1) \equiv \sum_i \pi_i\, U(\delta_i P, -I_1) + \beta \sum_j q_j\, H[P + r(E_0 - K - I_1),\ K + I_1,\ \pi^j]\right\} \qquad (23)$$

where $\pi^j = \pi + \lambda(e^j - \pi)$.

Clearly $I_2^*(P,K,\pi)$ satisfies

$$\sum_i \pi_i\, W_I(\delta_i, P, K, I_2^*) = 0. \qquad (24)$$

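The first order condition for an optimal $I_1^*$ is

$$J_I(\pi,P,K,\lambda,I_1^*) = 0 \qquad (25)$$

Totally differentiating this with respect to $\lambda$ and $I_1$ yields

$$J_{I\lambda}\, d\lambda + J_{II}\, dI_1 = 0 \;\Rightarrow\; \frac{dI_1^*}{d\lambda} = -\frac{J_{I\lambda}}{J_{II}} \qquad (26)$$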


$j_{II}$ is negative since $j$ is the sum of two convex functions ($U$ and $H$) and it is thus easily shown that $j$ is convex in $I$. Differentiating eqn (23) we obtain

$$j_I = -\sum_i \pi_i U_C - \beta \sum_i \pi_i \{ r H_P - H_K \} \tag{27}$$

and

$$j_{I\lambda} = -\beta \sum_i \pi_i (e_i - \pi) \{ r H_{P\pi} - H_{K\pi} \} [P + r(E_0 - K - I_1),\; K + I_1,\; \pi_i']. \tag{28}$$

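From (22) we have

$$H_{j\pi_i}(P, K, \pi) = W_j[\theta_i, P, K, I_2^*] + \frac{\partial I_2^*}{\partial \pi_i} \sum_k \pi_k W_{Ij}[\theta_k, P, K, I_2^*] \quad \text{for } j = P, K. \tag{29}$$

And from (24),

$$\frac{\partial I_2^*}{\partial \pi_i} = -\frac{W_I[\theta_i, P, K, I_2^*]}{\sum_k \pi_k W_{II}[\theta_k, P, K, I_2^*]}. \tag{30}$$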
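The steps behind eqns (29) and (30) can be written out as follows. This is a sketch that assumes, as the notation suggests, that each $\pi_i$ is varied with the other probabilities held fixed; the summation index $k$ is introduced here only to keep it distinct from the state $i$.

```latex
\begin{align*}
% Envelope step: the term in dI_2^*/dj vanishes by the first order condition (24).
H_j(P,K,\pi) &= \sum_k \pi_k W_j(\theta_k, P, K, I_2^*), \qquad j = P, K \\
% Differentiate H_j with respect to \pi_i to obtain (29).
H_{j\pi_i}(P,K,\pi) &= W_j(\theta_i, P, K, I_2^*)
   + \frac{\partial I_2^*}{\partial \pi_i} \sum_k \pi_k W_{Ij}(\theta_k, P, K, I_2^*) \\
% Differentiate the first order condition (24) with respect to \pi_i.
0 &= W_I(\theta_i, P, K, I_2^*)
   + \frac{\partial I_2^*}{\partial \pi_i} \sum_k \pi_k W_{II}(\theta_k, P, K, I_2^*) \\
% Solve for the response of second-period investment to beliefs, which is (30).
\frac{\partial I_2^*}{\partial \pi_i} &=
   -\frac{W_I(\theta_i, P, K, I_2^*)}{\sum_k \pi_k W_{II}(\theta_k, P, K, I_2^*)}
\end{align*}
```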


Eqns (28)-(30) can be combined to yield an expression for $j_{I\lambda}$. As before, it is not easy to sign. However, results are obtainable for a restricted case:

Prop 2: In a two state model with uncertainty in damage from the pollution stock and emission control through a stock of control capital, the effect of learning on current period investment in control capital is ambiguous. Assume a second-order approximation of utility. If at an optimum the marginal value of control capital in future periods increases sufficiently rapidly in the pollution stock, then it will be optimal to under-invest in control capital. Alternatively, if that increase is sufficiently small, it will be desirable to over-control.

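Proof: The sign of $dI_1^*/d\lambda$ is the same as the sign of $j_{I\lambda}$ (from eqn 26). For the two states-of-the-world case, eqn (28) becomes

$$j_{I\lambda} = -\beta \pi_1 (1 - \pi_1) \{ r H_{P\pi}[\tilde P, \tilde K, \pi_1'] - H_{K\pi}[\tilde P, \tilde K, \pi_1'] - r H_{P\pi}[\tilde P, \tilde K, \pi_2'] + H_{K\pi}[\tilde P, \tilde K, \pi_2'] \}. \tag{31}$$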


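As in Prop. 1, define $a = (-1, 1)$ and $f(\alpha) = r H_{P\pi}[\tilde P, \tilde K, \pi_1' + \alpha a] - H_{K\pi}[\tilde P, \tilde K, \pi_1' + \alpha a]$ with $\alpha \in R_+$, $\tilde P = P + r(E_0 - K - I_1^*)$ and $\tilde K = K + I_1^*$. Since $f(0) - f(\lambda)$ is the term in braces in eqn (31), $f'(\alpha)$ positive for all $\alpha$, $0 \le \alpha \le \lambda \le 1$, implies that $dI_1^*/d\lambda$ is positive and similarly $f'(\alpha)$ negative implies $dI_1^*/d\lambda$ is negative. Differentiating $f(\alpha)$, using eqn (29), yields: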


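$$f'(\alpha) = [\,\ldots\,] \tag{32}$$

(The right-hand side of eqn (32) is not legible in this copy. The fragments that survive involve $\partial I_2^*/\partial \pi$, the terms $rW_{PI} - W_{KI}$ and $rW_{PP} - W_{PK}$, the probability $\pi_1' - \alpha$, and, in the final line, $M_{KI}$ and $M_{KP}$.)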


where $M(\theta_i, P, K, I_1, I_2) = W(\theta_i, P + r(E_0 - K - I_1), K + I_1, I_2)$

and where our assumption on a second-order approximation to utility allows us to set third-order derivatives to zero. Because $W_{I\theta} > 0$, then $W_I[\theta_1, P, K, I_2^*] < 0$ and thus (from eqn 30), $dI_2^*/d\pi < 0$. $M_K = -rW_P + W_K$ is the marginal value to the future of a unit of control capital today. $M_{KI}$ is negative and $M_{KP}$ is positive. The expression with probabilities in brackets in eqn (32) is positive. If $M_{KI}$ dominates, $dI_1^*/d\lambda > 0$; if $M_{KP}$ dominates, $dI_1^*/d\lambda < 0$. Thus $dI_1^*/d\lambda$ can be made positive by making $M_{KP}$ sufficiently small and negative by making $M_{KP}$ sufficiently large. □
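The sign argument used in the proof can be spelled out step by step. The sketch below assumes that $f$ is evaluated at the beliefs $\pi_1'$ when $\alpha = 0$, writes $\beta$ for the discount factor in eqn (23), and uses $j_{II} < 0$ as stated after eqn (26).

```latex
\begin{align*}
% Posterior beliefs in the two states differ by a multiple of a = (-1, 1).
\pi_1' &= \pi + \lambda(e_1 - \pi), \qquad \pi_2' = \pi + \lambda(e_2 - \pi) \\
\pi_2' - \pi_1' &= \lambda(e_2 - e_1) = \lambda(-1, 1) = \lambda a \\
% Hence f(0) is evaluated at \pi_1' and f(\lambda) at \pi_2'.
f(0) - f(\lambda) &= -\int_0^{\lambda} f'(\alpha)\, d\alpha \\
% Substitute into the two-state expression (31).
j_{I\lambda} &= -\beta \pi_1 (1 - \pi_1)\,[f(0) - f(\lambda)]
             = \beta \pi_1 (1 - \pi_1) \int_0^{\lambda} f'(\alpha)\, d\alpha \\
% Apply (26) with j_{II} < 0.
\frac{dI_1^*}{d\lambda} &= -\frac{j_{I\lambda}}{j_{II}}
   \;\Longrightarrow\;
   \operatorname{sign} \frac{dI_1^*}{d\lambda}
   = \operatorname{sign} \int_0^{\lambda} f'(\alpha)\, d\alpha
\end{align*}
```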

To  understand  this  proposition,  it  is  important  to  understand  the  mechanism  whereby  learning 
acquires  value.  Learning  resolves  uncertainty  in  the  effect  of  pollution  on  future  utility. 
Alternatively,  think  of  learning  as  resolving  the  uncertainty  in  the  benefits  of  pollution  control  which 
are,  of  course,  directly  related  to  future  pollution  levels.  The  higher  the  marginal  benefits  of 
pollution  control,  the  greater  the  potential  value  of  information  about  it.  Thus  at  an  optimum,  an 
increase  in  the  rate  of  learning  will  induce  actions  that  increase  the  marginal  benefit  of  pollution 
control.  This  is  why  in  Prop.  1,  increased  learning  induced  reduced  emissions  since  lower  emissions 
result  in  lower  pollution  stocks  and  thus  a  higher  marginal  utility  of  emissions  control. 

Similarly, here the value of emissions control is what yields the value of learning. Thus if one increases the rate of learning, one would expect a marginal change in emission control that yields an increase in the marginal value of emission control. $M_K$ is the marginal value of a unit of emission control capital. $M_K$ is increasing in $P$ and decreasing in $I_2$. The greater the pollution stock, the greater the marginal payoff from emissions control. Also, the greater second period investment in pollution control, the lower the marginal value of current period investments in control. If $M_{KP}$ dominates, then increasing $P$ increases the value of learning so higher rates of learning call for more $P$ and thus less $I_1$. The reverse is true if $M_{KI}$ dominates.
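One way to explore the ambiguity numerically is a small grid-search version of the problem in eqns (22) and (23). The sketch below is not the paper's model: the quadratic forms chosen for U and V, the parameter values, the discount factor BETA, and all function names are illustrative assumptions, and the direction in which the optimal first-period investment moves with the learning rate depends on those choices.

```python
# Toy two-state problem in the spirit of eqns (22)-(23).
# Every functional form and parameter value here is an illustrative assumption.
THETA = (0.5, 1.5)                      # damage parameter in each state
PRIOR = (0.5, 0.5)                      # prior beliefs pi
E0, R, BETA = 1.0, 1.0, 1.0             # gross emissions, stock coefficient, discount
GRID = [i / 100 for i in range(101)]    # candidate investment levels in [0, 1]


def U(x, c):
    """Period utility of effective stock x = theta * P and consumption c = -I."""
    return c - 0.5 * c * c - 0.5 * x * x


def V(x):
    """Terminal payoff from the effective final stock."""
    return -0.5 * x * x


def step(P, K):
    """Stock update: the stock stops growing once K reaches E0."""
    return P + R * max(E0 - K, 0.0)


def W(theta, P, K, I2):
    """Second-period payoff in one state, given irreversible investment I2 >= 0."""
    return U(theta * P, -I2) + V(theta * step(P, K + I2))


def H(P, K, beliefs):
    """Second-period value under given beliefs, by grid search over I2."""
    return max(sum(b * W(t, P, K, I2) for b, t in zip(beliefs, THETA))
               for I2 in GRID)


def posterior(state, lam):
    """Beliefs pi' = pi + lam * (e_state - pi)."""
    return tuple(p + lam * ((1.0 if k == state else 0.0) - p)
                 for k, p in enumerate(PRIOR))


def j(I1, lam, P=1.0, K=0.0):
    """First-period objective: expected utility now plus discounted expected H."""
    now = sum(p * U(t * P, -I1) for p, t in zip(PRIOR, THETA))
    P2, K2 = step(P, K + I1), K + I1
    later = sum(p * H(P2, K2, posterior(i, lam)) for i, p in enumerate(PRIOR))
    return now + BETA * later


def optimal_I1(lam):
    """Grid maximizer of j for a given learning rate lam."""
    return max(GRID, key=lambda I1: j(I1, lam))


if __name__ == "__main__":
    for lam in (0.0, 0.5, 1.0):
        print("lambda =", lam, "optimal I1 =", optimal_I1(lam))
```

Rerunning the sketch with different curvature in U and V is a quick way to see how the comparison between the no-learning and learning cases responds to the assumed payoffs.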

IV.   CONCLUSIONS 

This  paper  has  explored  the  implications  of  learning  on  optimal  control  of  emissions  of  a  stock 
externality.  The  issue  is  significant  theoretically  as  well  as  of  utmost  importance  empirically, 
particularly  for  the  case  of  global  warming. 

There is no clear accepted wisdom on this type of problem. One view is that because pollution is accumulating we should tend to over-control now, while we are learning. The risks of getting too much pollution outweigh the risks of spending too much on control. This position hinges less on the fact that learning is taking place than on the fact that there is an irreversibility. The alternative view is that because we are learning rapidly, we may as well put control off until tomorrow when we will know much more about the problem.

There are two results in this paper. One is that when emissions control is perfectly reversible (no sunk capital; control levels can be raised or lowered at any time), the fact that one is learning will cause the optimal level of current emissions to be lower. Because of the stock effects of pollution, it is in fact prudent to err on the side of under-emission.

However, the result is more ambiguous when there is a stock effect both in pollution accumulation and in emission control, through irreversible investments in control capital. The direction of bias in today's emission control capital investment decision depends on how learning will affect, ex post, the value of an investment in control capital. If learning may result in dramatically larger or smaller marginal values for a unit of control capital investment, then one would be reducing the value of learning by excessively investing in control today. This leads to under-control relative to the no learning case. On the other hand, if learning has less payoff in terms of the value of control capital investment, then the irreversibility in pollution dominates and we have our first result.

Clearly, there are a number of possible future directions for this work, including making the results more general and adding different forms of uncertainty. Of course, definitive answers to these questions for specific policy applications require an empirical implementation of these models, involving more structure and data than are found here. Hopefully this work will motivate such empirical investigations.

Figure 1. Star-shaped spreading of beliefs from $\pi$.